238 research outputs found

    DYNAMO-MAS: a multi-agent system for ontology evolution from text

    Get PDF
    International audienceManual ontology development and evolution are complex and time-consuming tasks, even when textual documents are used as knowledge sources in addition to human expertise or existing ontologies. Processing natural language in text produces huge amounts of linguistic data that need to be filtered out and structured. To support both of these tasks, we have developed DYNAMO-MAS, an interactive tool based on an adaptive multi-agent system (adaptive MAS or AMAS) that builds and evolves ontologies from text. DYNA-MO-MAS is a partner system to build ontologies; the ontologist interacts with the system to validate or modify its outputs. This paper presents the architecture of DYNAMO-MAS, its operating principles and its evaluation on three case studies

    Using Social Media to Promote STEM Education: Matching College Students with Role Models

    Full text link
    STEM (Science, Technology, Engineering, and Mathematics) fields have become increasingly central to U.S. economic competitiveness and growth. The shortage in the STEM workforce has brought promoting STEM education upfront. The rapid growth of social media usage provides a unique opportunity to predict users' real-life identities and interests from online texts and photos. In this paper, we propose an innovative approach by leveraging social media to promote STEM education: matching Twitter college student users with diverse LinkedIn STEM professionals using a ranking algorithm based on the similarities of their demographics and interests. We share the belief that increasing STEM presence in the form of introducing career role models who share similar interests and demographics will inspire students to develop interests in STEM related fields and emulate their models. Our evaluation on 2,000 real college students demonstrated the accuracy of our ranking algorithm. We also design a novel implementation that recommends matched role models to the students.Comment: 16 pages, 8 figures, accepted by ECML/PKDD 2016, Industrial Trac

    Improving translation memory matching and retrieval using paraphrases

    Get PDF
    This is an accepted manuscript of an article published by Springer Nature in Machine Translation on 02/11/2016, available online: https://doi.org/10.1007/s10590-016-9180-0 The accepted version of the publication may differ from the final published version.Most of the current Translation Memory (TM) systems work on string level (character or word level) and lack semantic knowledge while matching. They use simple edit-distance calculated on surface-form or some variation on it (stem, lemma), which does not take into consideration any semantic aspects in matching. This paper presents a novel and efficient approach to incorporating semantic information in the form of paraphrasing in the edit-distance metric. The approach computes edit-distance while efficiently considering paraphrases using dynamic programming and greedy approximation. In addition to using automatic evaluation metrics like BLEU and METEOR, we have carried out an extensive human evaluation in which we measured post-editing time, keystrokes, HTER, HMETEOR, and carried out three rounds of subjective evaluations. Our results show that paraphrasing substantially improves TM matching and retrieval, resulting in translation performance increases when translators use paraphrase-enhanced TMs

    The first Automatic Translation Memory Cleaning Shared Task

    Get PDF
    This is an accepted manuscript of an article published by Springer in Machine Translation on 21/01/2017, available online: https://doi.org/10.1007/s10590-016-9183-x The accepted version of the publication may differ from the final published version.This paper reports on the organization and results of the rst Automatic Translation Memory Cleaning Shared Task. This shared task is aimed at nding automatic ways of cleaning translation memories (TMs) that have not been properly curated and thus include incorrect translations. As a follow up of the shared task, we also conducted two surveys, one targeting the teams participating in the shared task, and the other one targeting professional translators. While the researchers-oriented survey aimed at gathering information about the opinion of participants on the shared task, the translators-oriented survey aimed to better understand what constitutes a good TM unit and inform decisions that will be taken in future editions of the task. In this paper, we report on the process of data preparation and the evaluation of the automatic systems submitted, as well as on the results of the collected surveys

    Fifty years of spellchecking

    Get PDF
    A short history of spellchecking from the late 1950s to the present day, describing its development through dictionary lookup, affix stripping, correction, confusion sets, and edit distance to the use of gigantic databases

    Classifying Candidate Axioms via Dimensionality Reduction Techniques

    Get PDF
    We assess the role of similarity measures and learning methods in classifying candidate axioms for automated schema induction through kernel-based learning algorithms. The evaluation is based on (i) three different similarity measures between axioms, and (ii) two alternative dimensionality reduction techniques to check the extent to which the considered similarities allow to separate true axioms from false axioms. The result of the dimensionality reduction process is subsequently fed to several learning algorithms, comparing the accuracy of all combinations of similarity, dimensionality reduction technique, and classification method. As a result, it is observed that it is not necessary to use sophisticated semantics-based similarity measures to obtain accurate predictions, and furthermore that classification performance only marginally depends on the choice of the learning method. Our results open the way to implementing efficient surrogate models for axiom scoring to speed up ontology learning and schema induction methods

    Reconstructing Words from Right-Bounded-Block Words

    Full text link
    A reconstruction problem of words from scattered factors asks for the minimal information, like multisets of scattered factors of a given length or the number of occurrences of scattered factors from a given set, necessary to uniquely determine a word. We show that a word w{a,b}w \in \{a, b\}^{*} can be reconstructed from the number of occurrences of at most min(wa,wb)+1\min(|w|_a, |w|_b)+ 1 scattered factors of the form aiba^{i} b. Moreover, we generalize the result to alphabets of the form {1,,q}\{1,\ldots,q\} by showing that at most i=1q1wi(qi+1) \sum^{q-1}_{i=1} |w|_i (q-i+1) scattered factors suffices to reconstruct ww. Both results improve on the upper bounds known so far. Complexity time bounds on reconstruction algorithms are also considered here

    The Settlement of Madagascar: What Dialects and Languages Can Tell Us

    Get PDF
    The dialects of Madagascar belong to the Greater Barito East group of the Austronesian family and it is widely accepted that the Island was colonized by Indonesian sailors after a maritime trek that probably took place around 650 CE. The language most closely related to Malagasy dialects is Maanyan, but Malay is also strongly related especially for navigation terms. Since the Maanyan Dayaks live along the Barito river in Kalimantan (Borneo) and they do not possess the necessary skill for long maritime navigation, they were probably brought as subordinates by Malay sailors. In a recent paper we compared 23 different Malagasy dialects in order to determine the time and the landing area of the first colonization. In this research we use new data and new methods to confirm that the landing took place on the south-east coast of the Island. Furthermore, we are able to state here that colonization probably consisted of a single founding event rather than multiple settlements.To reach our goal we find out the internal kinship relations among all the 23 Malagasy dialects and we also find out the relations of the 23 dialects to Malay and Maanyan. The method used is an automated version of the lexicostatistic approach. The data from Madagascar were collected by the author at the beginning of 2010 and consist of Swadesh lists of 200 items for 23 dialects covering all areas of the Island. The lists for Maanyan and Malay were obtained from a published dataset integrated with the author's interviews

    Contextual and Behavioral Customer Journey Discovery Using a Genetic Approach

    Get PDF
    With the advent of new technologies and the increase in customers’ expectations, services are becoming more complex. This complexity calls for new methods to understand, analyze, and improve service delivery. Summarizing customers’ experience using representative journeys that are displayed on a Customer Journey Map (CJM) is one of these techniques. We propose a genetic algorithm that automatically builds a CJM from raw customer experience recorded in a database. Mining representative journeys can be seen a clustering task where both the sequence of activities and some contextual data (e.g., demographics) are considered when measuring the similarity between journeys. We show that our genetic approach outperforms traditional ways of handling this clustering task. Moreover, we apply our algorithm on a real dataset to highlight the benefit of using a genetic approach
    corecore